This is an interactive notebook that you can run locally.
Custom Routing for LLM Prompts with Not Diamond
This notebook demonstrates how to use Weave with Not Diamond's custom routing to route LLM prompts to the most appropriate model based on evaluation results.

Routing prompts
When building complex LLM workflows, users may need to prompt different models according to accuracy, cost, or call latency. With Not Diamond, you can route prompts in these workflows to the right model for your needs, helping to maximize accuracy while saving on model costs. For any given distribution of data, a single model will rarely outperform every other model on every query. By combining multiple models into a "meta-model" that learns when to call each LLM, you can beat every individual model's performance and even drive down costs and latency in the process.

Custom routing
You need three things to train a custom router for your prompts (an example of the data layout follows this list):
- A set of LLM prompts: Prompts must be strings and should be representative of the prompts used in your application.
- LLM responses: The responses from candidate LLMs for each input. Candidate LLMs can include both Not Diamond's supported LLMs and your own custom models.
- Evaluation scores for responses to the inputs from candidate LLMs: Scores are numbers and can reflect any metric that fits your needs.
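
For concreteness, here is a minimal sketch of what this training data can look like as a pandas DataFrame. The column names, model names, and scores are illustrative, not required by any API:

```python
import pandas as pd

# Illustrative training data: each row pairs a prompt with one candidate
# model's response and a numeric evaluation score (here, 1.0 = pass, 0.0 = fail).
train_df = pd.DataFrame(
    {
        "prompt": [
            "Write a function that reverses a string.",
            "Write a function that reverses a string.",
            "Write a function that checks if a number is prime.",
            "Write a function that checks if a number is prime.",
        ],
        "model": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20240620"] * 2,
        "response": [
            "def reverse(s):\n    return s[::-1]",
            "def reverse(s):\n    return ''.join(reversed(s))",
            "def is_prime(n):\n    return n > 1",
            "def is_prime(n):\n    return n > 1 and all(n % i for i in range(2, int(n**0.5) + 1))",
        ],
        "score": [1.0, 1.0, 0.0, 1.0],
    }
)
```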
Setting up the training data
In practice, you will use your own Evaluations to train a custom router. For this example notebook, however, you will use LLM responses for the HumanEval dataset to train a custom router for coding tasks. You start by downloading the dataset prepared for this example, then parse the LLM responses into EvaluationResults for each model.
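
A minimal sketch of this step, assuming the prepared data is a single CSV with one response column and one score column per model. The URL and column layout here are hypothetical placeholders for the actual artifact prepared for this example:

```python
import pandas as pd

# Hypothetical location of the prepared HumanEval responses; substitute the
# actual dataset prepared for this example.
DATA_URL = "https://example.com/humaneval_responses.csv"

df = pd.read_csv(DATA_URL)

models = ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20240620"]

# Split the flat CSV into one table of (prompt, response, score) rows per
# candidate model: the per-model evaluation results the router is trained on.
model_results = {
    model: df[["prompt", f"{model}/response", f"{model}/score"]].rename(
        columns={f"{model}/response": "response", f"{model}/score": "score"}
    )
    for model in models
}
```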
Training a custom router

Now that you have EvaluationResults, you can train a custom router. Make sure you have created a Not Diamond account and generated an API key, then insert your API key below.
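
A sketch of the training call using Not Diamond's custom router toolkit. The `CustomRouter.fit` signature shown here follows our reading of the SDK, so verify the parameter names against the current Not Diamond documentation:

```python
import os
from notdiamond.toolkit.custom_router import CustomRouter

os.environ["NOTDIAMOND_API_KEY"] = "<YOUR_API_KEY>"  # insert your API key here

# Train a custom router on the per-model results assembled above.
# `fit` uploads the data and returns a preference_id that identifies
# your trained router in subsequent routing calls.
trainer = CustomRouter()
preference_id = trainer.fit(
    dataset=model_results,        # {model_name: DataFrame} from the previous step
    prompt_column="prompt",
    response_column="response",
    score_column="score",
)
print(f"Trained router preference_id: {preference_id}")
```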


Evaluating your custom router
Once you have trained your custom router, you can evaluate its performance in two ways (sketched below):
- in-sample, by submitting the training prompts, or
- out-of-sample, by submitting new or held-out prompts.
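
As a sketch, both cases reduce to routing prompts through the trained router and scoring the responses with your usual evaluation harness. The `model_select` call below follows our reading of the Not Diamond SDK (it returns the recommended model without invoking it), and the held-out prompt is illustrative:

```python
from notdiamond import NotDiamond

client = NotDiamond()  # reads NOTDIAMOND_API_KEY from the environment

# Ask the trained router which candidate model to use for a prompt.
# Submitting training prompts here measures in-sample performance;
# submitting new or held-out prompts measures out-of-sample performance.
session_id, provider = client.chat.completions.model_select(
    messages=[{"role": "user", "content": "Write a function that merges two sorted lists."}],
    model=["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20240620"],
    preference_id=preference_id,
)
print(f"Routed to: {provider}")
```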
